Statistical Sense Disambiguation with Relatively Small Corpora Using Dictionary Definitions
Abstract
Corpus-based sense disambiguation methods, like most other statistical NLP approaches, suffer from the problem of data sparseness. In this paper, we describe an approach which overcomes this problem using dictionary definitions. Using the definition-based conceptual co-occurrence data collected from the relatively small Brown corpus, our sense disambiguation system achieves an average accuracy comparable to human performance given the same contextual information.
Similar resources
Empirical Acquisition Of Differentiating Relations From Definitions
This paper describes a new automatic approach for extracting conceptual distinctions from dictionary definitions. A broad-coverage dependency parser is first used to extract the lexical relations from the definitions. Then the relations are disambiguated using associations learned from tagged corpora. This contrasts with earlier approaches using manually developed rules for disambiguation.
Automatic Acquisition of Sense Tagged Corpora
An important problem in Natural Language Processing is identifying the correct sense of a word in a particular context. Thus far, statistical methods have been considered the best techniques in word sense disambiguation. Unfortunately, these methods produce high accuracy results only for a small number of preselected words. The reduced applicability of statistical methods is due basically to the...
Word sense disambiguation with pattern learning and automatic feature selection
This paper presents a novel approach for word sense disambiguation. The underlying algorithm has two main components: (1) pattern learning from available sense-tagged corpora (SemCor), from dictionary definitions (WordNet) and from a generated corpus (GenCor); and (2) instance based learning with automatic feature selection, when training data is available for a particular word. The ideas descr...
Constructing Word-Sense Association Networks from Bilingual Dictionary and Comparable Corpora
A novel thesaurus named a word-sense association network is proposed for the first time. It consists of nodes representing word senses, each of which is defined as a set consisting of a word and its translation equivalents, and edges connecting topically associated word senses. This word-sense association network is produced from a bilingual dictionary and comparable corpora by means of a new...
A 3-Steps Algorithm for Morphological Disambiguation Using Untagged Corpora
This article presents a three-step algorithm for morphological disambiguation between the definite article and the personal pronoun in French. Tested accuracy on large untagged corpora exceeds 98% with less than 1% error. Our method has also been tested on unlabeled Greek corpora, and the results demonstrate the system’s portability to other languages with similar structure. Not a...
Publication date: 1995